Markovian language model of the DNA and its information content
نویسندگان
چکیده
This work proposes a Markovian memoryless model for the DNA that simplifies enormously the complexity of it. We encode nucleotide sequences into symbolic sequences, called words, from which we establish meaningful length of words and groups of words that share symbolic similarities. Interpreting a node to represent a group of similar words and edges to represent their functional connectivity allows us to construct a network of the grammatical rules governing the appearance of groups of words in the DNA. Our model allows us to predict the transition between groups of words in the DNA with unprecedented accuracy, and to easily calculate many informational quantities to better characterize the DNA. In addition, we reduce the DNA of known bacteria to a network of only tens of nodes, show how our model can be used to detect similar (or dissimilar) genes in different organisms, and which sequences of symbols are responsible for most of the information content of the DNA. Therefore, the DNA can indeed be treated as a language, a Markovian language, where a 'word' is an element of a group, and its grammar represents the rules behind the probability of transitions between any two groups.
منابع مشابه
A Model for Detecting of Persian Rumors based on the Analysis of Contextual Features in the Content of Social Networks
The rumor is a collective attempt to interpret a vague but attractive situation by using the power of words. Therefore, identifying the rumor language can be helpful in identifying it. The previous research has focused more on the contextual information to reply tweets and less on the content features of the original rumor to address the rumor detection problem. Most of the studies have been in...
متن کاملRandomized Algorithm For 3-Set Splitting Problem and it's Markovian Model
In this paper we restrict every set splitting problem to the special case in which every set has just three elements. This restricted version is also NP-complete. Then, we introduce a general conversion from any set splitting problem to 3-set splitting. Then we introduce a randomize algorithm, and we use Markov chain model for run time complexity analysis of this algorithm. In the last section ...
متن کاملMonte Carlo Simulation to Compare Markovian and Neural Network Models for Reliability Assessment in Multiple AGV Manufacturing System
We compare two approaches for a Markovian model in flexible manufacturing systems (FMSs) using Monte Carlo simulation. The model which is a development of Fazlollahtabar and Saidi-Mehrabad (2013), considers two features of automated flexible manufacturing systems equipped with automated guided vehicle (AGV) namely, the reliability of machines and the reliability of AGVs in a multiple AGV jobsho...
متن کاملNoospheric Psychological-Educational Paradigm as a Methodological Basis for Teaching Russian-Language Business Communication to Foreign Students
In the context of the polyparadigmatic system of higher education, the noospheric psychological-pedagogical paradigm is considered, on its basis a lingvodidactic model is developed for the formation of professional-communicative competence (PCC) in Russian-language business communication among foreign students. The research focuses on the basic principles of the noospheric paradigm, which procl...
متن کاملAttitudes towards English as an International Language (EIL) in Iran: Development and Validation of a New Model and Questionnaire
This study aimed at developing and validating a new model and instrument to explore attitudes of Iranian EFL learners towards English as an International Language (EIL). In so doing, the researchers followed several rigorous steps including extensive literature review, content selection, item generation, designing the rating scales and personal information part, Delphi technique, item revision,...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره 3 شماره
صفحات -
تاریخ انتشار 2016